IRIX Patches 1995 March


Text File | 1995-03-10 | 32KB | 793 lines

1.  Release Notes for IRIX 5.2 patch SG0000254

This release note describes patch SG0000254 to IRIX 5.2. It contains the following information:

o  Hardware/software platforms supported
o  List of the bugs fixed by this patch
o  Compatibility considerations
o  List of subsystems included in this patch
o  Installation instructions

1.1  Hardware platforms supported

This patch to IRIX 5.2 supports the following machine types:

o  Challenge and Onyx with R4400 processors
o  Crimson (4D/510)
o  PowerSeries (4D/120, 4D/2xx, 4D/3xx and 4D/4xx)

1.2  Software platforms supported

This patch will install on the following software platforms:

o  IRIX 5.2
o  IRIX 5.2 for 200MHz IP19 (Challenge or Onyx, excluding Onyx Extreme)

1.3  Bugs fixed by patch SG0000254

This patch contains fixes for the following problems in IRIX 5.2 (bug numbers from the SGI bug tracking system are included for reference):

o  Multiprocessor systems acting as NFS servers can crash if multiple operations attempt to update the list of exported NFS filesystems simultaneously (bug 141828).

o  The system could loop forever printing the message "CPU x WARNING:tlbmiss: invalid badvaddr XXX". This could happen if there was an invalid reference in the kernel to kernel heap space. When this situation arose, the system had to be forcibly rebooted. The kernel now panics whenever such a situation arises so that the errant code can be isolated (bug 189291).

o  When dbx was started on a user VME driver that maps in VME bus memory via the mmap() system call, attempting to print the contents of that memory using dbx crashed the machine. This was true for the Challenge/Onyx machines as well as the older 4D series machines. This bug is fixed in this patch release.

   It is now possible to run a user VME process under dbx and view the variables in VME memory using dbx primitives such as print. This allows users to examine the variables as bytes, half-words, or words, depending on the type of VME memory. Printing a structure whose size is greater than the maximum access size supported by the VME board (for D16 it could be 2 bytes, for D32 it could be 4 bytes, and so on) still does NOT work. This is primarily because VME bus access is size sensitive. A D16 board may respond only to 2-byte accesses, and any other access size could cause problems for the board. As a result, when a user tries to print a structure, the kernel does not know which access size is right for the VME address space in question. If the size is more than 8 bytes, it returns an error (bug 189318).

o  Lost clock interrupts on Power Series machines cause time to drift under a heavy system load. This patch provides a temporary workaround for the problem on machines equipped with an IO3 board. Machines that do not have an IO3 board installed are unaffected by this patch (bug 192233).

o  The df(1) command can return a negative number as the count of blocks used on the /proc file system under some conditions (bug 193935).

o  The IO4 serial driver (applicable only on Challenge and Onyx) has been modified for better serial performance at sustained high baud rates. The interrupt priority of the duarts has been raised to prevent long interrupt masking under heavy loads from causing the duart hardware to drop incoming characters. As a side effect, it is now possible to program the duarts to interrupt less frequently, reducing the CPU cost of heavy serial traffic. Users should see better performance at a lower cost.

   Users are advised, however, that because the duart now interrupts at a very high priority, it is possible to bring the entire machine to a halt by flooding it with serial traffic on a large number of ports at the maximum baud rate. The machine may reach a state where it spends 100% of its time handling serial interrupts. Only the master CPU is actually tied up in this fashion, but other CPUs may also be tied up waiting for the master CPU to release a needed resource.

   The high bandwidth capability and decreased cost are nullified if the duart_rsrv_duration timeout variable is configured to 0 at kernel build time, or if this value is reset at run time with the SIOC_ITIMER ioctl command (see serial(7)). A value of 0 indicates that the user wants the smallest possible latency when receiving characters. All of the techniques that improve high baud rate performance entail some latency, so they cannot be used in this case. The user can expect a significant performance penalty when setting this timeout to 0 (bug 200377).

o  There is a bug in the gang scheduler that prevents priority from being observed between gangs on the same queue. Additionally, the batch gang queue can improperly run gangs even though the gang queue has valid work in it (bug 200394).

o  There is a bug in the code that keeps track of the IP multicast addresses that a host is accepting. Systems that use IP multicasting occasionally have some multicast addresses deleted while they are still in use, or continue to listen for multicast addresses that are no longer in use (bug 201283).

o  Multiprocessor Challenge and Onyx machines running IRIX 5.2 can hang as a result of a software deadlock (bug 204252).

o  With the IRIX Extent File System code in 5.2, files that are open and have been extended since the last time they were closed are likely to be lost when the system crashes for any reason.

   Changes have been made to the file system code in the kernel and to the file system check utility (fsck) to reduce significantly the amount of data that is lost when the system crashes or loses power with extended application files still open. Note that there is still no guarantee that all writes done by applications will be preserved across a system crash. The file system buffers writes and commits the data to disk asynchronously by design (bug 204253).

o  Extending a file by writing to it on an NFS mounted file system was slower than it should have been because of incorrect interactions between the NFS server code and the file system on the server side (bug 204732).

o  MP protection was added around automounter updates to /etc/mtab.

o  The automounter no longer attempts to mount over an already mounted (root) child filesystem. The child filesystem will be the root if the client is unable to mount the parent filesystem because of permissions, timeouts, etc. (bug 172695).

o  There is a race condition in the communication between the kernel and the local lock manager that can cause NFS to hang (bug 205438).

o  There is a race condition in NFS client handle allocation that can cause NFS to hang under heavy loads on the client (bug 205453).

o  On large memory systems, the kernel previously had no throttle mechanism on the use of kernel virtual address space. Kernel virtual address space is used to map the kernel and its control data structures. Sometimes EFS, NFS, raw I/O and other operations could cause the operating system to consume too much kernel virtual space mapping file system buffers. A new, dynamically tunable kernel variable, "bmappedpct", has been added to limit the percentage of "syssegsz" kernel virtual space that file system buffers are allowed to use. When that value is exceeded, the system actively attempts to reclaim virtual address space. As shipped, this patch sets this value to 50%.

   Setting this tunable variable to "100" (that is, 100%) effectively disables this new control (bug 205422).

o  Related to bug 205422, under some circumstances buffers associated with a logical volume could remain mapped for extended periods of time. Usually this would not cause a problem, but in cases where only a limited amount of kernel virtual space is available, it could make matters worse. The fix included in this patch causes the logical volume driver always to unmap the buffers it maps (bug 250335).

o  The logical volume driver has a race condition between the "open" and "ioctl" entry points. This patch includes a fix that serializes opens and ioctls on logical volumes. This fixes problems encountered when running multiple mklv commands on the same LV at the same time (bug 250334).

o  A race condition exists between a process exiting and another process looking at that process's credentials using the /proc interface. The /proc interface attempts to look at a process's credentials after releasing a lock on the process entry. If the process exits within a few instructions of the lock being released, the /proc support can use an invalid pointer and panic the system. The solution in this patch holds the process entry locked until all credential information is copied (bug 249685).

o  The profiling clock was running continuously on all processors, even when no profiling was in progress. This bug affects Challenge and Onyx only (bug 206673).

o  Under certain loads, the system occasionally appears to be idle for periods of up to 90 seconds, even when there are active jobs that should be running. This bug affects Challenge and Onyx only (bugs 193082 and 207844).

o  Internet port numbers that can be automatically assigned were limited to 5000. This limit has been increased to 65535.

o  Several software deadlocks that can cause the system to hang have been fixed (bug 208087).

o  The normal diagnostics run at system power-up leave some error bits set in the hardware that were not being completely cleared by the operating system at boot time. This residual error state causes other hardware errors to be misdiagnosed. The kernel boot code now clears these error bits. This bug affects Challenge and Onyx only (bug 209406).

o  There is an error in the system audit trail mechanism that can cause the system to crash when handling pathnames of certain formats (bug 212708).

o  There is an error in the gang scheduler that can, under certain circumstances, cause a gang to starve if it has a large number of processes associated with it (bug 214170).

o  Multiprocessing EVEREST systems failed to start all processors under some software configurations. The problem was most noticeable with a kernel linked for debugging and no symmon available at boot time. Usually only the master CPU would boot, with all slaves failing to start. The problem has also been seen on non-debug versions of the kernel. It was caused by a race condition between the master CPU and the slaves during the early boot process. The appropriate synchronization is implemented in this patch (bug 214364).

o  Using a regular file in a file system as a supplementary swap area can cause the system to crash during heavy swapping (bug 214374).

o  The combination of heavy outbound network traffic using large buffers (as generated by ftp puts, for example) and heavy page aging by the virtual memory system when free memory is low can cause a multiprocessor system to hang in the page flipping code (bug 216587).

o  The disk quotas facility did not work in the previous IRIX 5.2 patches SG0000001 and SG0000022. This has been fixed in this patch release.

o  There is a performance problem related to the creation of sproc children that have local mappings, which results in excessive rfault rates for the child.

   This occurs when the parent already has the local mapping in its address space when sproc is called (bug 222221).

o  A POSIX conformance bug that caused the file access time to be updated for a read of a file on a read-only file system has been fixed (bug 223286).

o  Another POSIX conformance bug has been fixed: when fcntl was used to duplicate a file descriptor enough times to exceed the user's allowable number of open file descriptors, it returned the error EMFILE. It now returns EINVAL (bug 223492).

o  A further POSIX conformance bug closed in this patch concerns a process that handles SIGCONT and is sleeping inside a system call at an interruptible priority: sending a job control stop signal interrupts the system call, which returns -1 and sets errno to EINTR (bug 223509).

o  Sproc processes that exec may carry with them wrong pages from the parent, potentially causing the execed process to hang (bug 227235).

o  Database systems using asynchronous I/O could corrupt data. However, only SYBASE version 10 is known to trigger this problem (bug 229896).

o  Support has been added for 4MB secondary cache systems in both IRIX and the IO4 prom.

o  The IO4 prom image also incorporates new segment loader software, and multiple versions of the IO4 software for different architectures. In particular, this version of the IO4 prom supports both IP19 and IP21 CPU boards. Some SCSI initialization and timeout values were changed to recover from the system attempting to boot from a disk that is not yet "ready".

o  The IO4 prom did not allow booting from a SCSI disk that was above address 7 on the SCSI bus (bug 240879).

o  A new feature was introduced in patchSG0000022 for EVEREST systems only (Challenge and Onyx with R4400 processors). For a certain class of memory errors, recovery is possible in software because the lost data is no longer required. This feature was disabled by default, but is now enabled in this patch.

   For example, errors that occur when zeroing a new page for a task may be safely ignored, since the previous data on the page is no longer needed. The kernel variable "ecc_recover_enable" enables and disables this recovery feature. A value of 0 indicates that recovery should not be attempted. A non-zero value represents the number of seconds over which 32 error recovery attempts can be made. In general, a value of 60 should be used to enable this feature. This is the value that is now set by default.

o  A bug that occurred as part of patch 33, in which the NOINTR directive could be ignored on EVEREST machines, causing problems with real-time latency, has been corrected (bug 235061).

o  Power Series and Crimson systems with dual VME buses did not support user mode access (/dev/vme) properly. A16 mode on the second bus did not work, and A32 did not work on either bus. These problems are corrected.

o  VME write error handling for Challenge/Onyx systems did not handle corner cases in which a VME write error followed by a VME read error would cause the system to crash in certain situations (bug 231142).

o  Fixed a problem where stressing a Power Series system with multiple ethernets (et0 and enp0) would cause the network subsystem to hang. This was also causing the SCSI subsystem to hang (bug 188296).

o  The system calls stat() or xstat() on a tty file can hang in kernel mode, leaving the process unkillable (bug 230375).

o  Multiprocessor systems that have R4000 CPUs, or R4400 CPUs at revision level 2.2 or less, and that use loadable drivers can crash due to a kernel segmentation violation. Such systems can panic with the cause being an RMISS and the bad_addr not matching the faulting pc. The workaround installed in the kernel detects this particular path when it is caused by an R4000 bug and allows the operation to be retried, which is needed to correct the problem (bug 236338).

o  Changes were made to the kernel that allow the kernel stack to be increased by an additional page for real-time processes and otherwise on an as-needed basis. This increases the reliability of the system by eliminating scenarios in which the kernel stack might overflow and panic the system, which occasionally arose in systems making heavy use of remote file systems, for example (bug 240710).

o  Fixed a problem where a binary compiled on IRIX 4.0.5 that uses the libc function getcwd() fails on an IRIX 5.2 machine that has a RAID filesystem when the binary is run on the RAID filesystem (bug 234992).

o  Under certain circumstances, mail could deadlock while accessing its lock file in an NFS-mounted mail directory. This fix makes it possible for users to have NFS-mounted mail directories (bug 228720).

o  A user-written program that attempted to read /dev/mem or /dev/kmem could cause the system to crash (bug 189764). This problem is now resolved.

o  Writing to a named pipe over NFS could cause a system panic. This problem could occur when running the previous IRIX 5.2 patches patchSG0000022, patchSG0000030, patchSG0000033 or patchSG0000047, all of which are replaced by this patch release.

o  The fuser command could cause a system panic when used on a machine with heavy socket creation/deletion activity (bug 209242).

o  IRIX 5.2 patchSG0000047 introduced a problem whereby an EFast ethernet board would not be seen on the VME bus of a PowerSeries (4D/120, 4D/2xx, 4D/3xx and 4D/4xx). That problem is fixed in this patch release.

o  When TCP connections are being created at a high rate, a system panic may occur with the message "soaccept !NOFDREF" (bug 249206). This fix avoids the race between accept() and tcp_drop().

o  When TCP connections are being created at a high rate, connections may time out even though the server is largely idle, because the backlog limit on the server's initial connection socket is restricted to a small value (bug 245976).

   This change allows the maximum backlog value to be reconfigured by modifying the variable somaxconn in /var/sysgen/master.d/bsd.

o  When remote TCP clients disappear permanently (the client systems do not respond to pings) with connections open and data queued for output, after the local server has closed the connection but before all the data has been delivered and acknowledged, the TCP socket is left in the kernel indefinitely, even if the server set the SO_KEEPALIVE option (bug 248935). This eventually uses up all available network buffer space. This change adds a new kernel variable, tcp_keep_timer_in_close, which may be set to a non-zero value to permit SO_KEEPALIVE timeouts to act on such sockets. The variable must presently be set using dbx or some other program that permits modification of kernel variables.

o  The system accounting programs sar(1) and sadc(1M) fail (dump core) when the system contains a large number of disk partitions (bug 214394). This patch removes the limitations in these programs on the number of disk partitions configured on a system.

o  Heavy disk usage involving 2 or more processes repeatedly reading the same section of disk, or the same file, could cause extremely slow response for any other process. A scheduling change has been made that prevents such processes from effectively blocking out other running processes (bug 237460).

o  A kernel scheduler panic could happen in a specific case of schedctl() changing a job priority from below 80 to above 80. This problem has been fixed by adding retry logic in pugDuty() (bug 240971).

o  On an MP system, a process can change the priority of another process that could be running on another CPU. This bug fix allows it to do so without running into any kernel stack extension page mismatches (bug 252308).

o  Temporary workaround for PROMs: current PROMs do not support port numbers with the sign bit set.

   The workaround is to limit the port numbers of anonymous connections to 32767. The problem was originally found when using tftpd, when tftp could not be used on a 5.3 server (bug 231136).

1.4  Compatibility considerations

This patch includes slight content changes to the following system header files as part of the fix to prevent any possible kernel stack overflow:

o  "/usr/include/sys/param.h"
o  "/usr/include/sys/proc.h"

These changes were made in a way that minimizes compatibility concerns, but software sensitive to the exact kernel proc struct may still need to be rebuilt. For this reason, sites using CASEVision/ClearCase must rebuild the MFS (MVFS for ClearCase 2.0 users) after installing patchSG0000125, patchSG0000139 or patchSG0000254. If these steps were taken after installing patchSG0000125 or patchSG0000139, they do not have to be repeated after installing patchSG0000254.

As the root user, execute the following instructions and then reboot. The CPUBOARD value IPxx may be determined from the hinv(1M) command.

If you are running ClearCase 1.1.4:

     % su
     # setenv CPUBOARD IPxx
     # cd /var/sysgen/boot
     # make -f /var/sysgen/Makefile.kernio mfs_param.o
     # mv mfs.o mfs.o.old
     # ld -o mfs.o -r premfs.o mfs_param.o
     # /etc/autoconfig -f

If you are running ClearCase 2.0:

     % su
     # setenv CPUBOARD IPxx
     # cd /var/sysgen/boot
     # make -f /var/sysgen/Makefile.kernio mvfs_param.o
     # mv mvfs.o mvfs.o.old
     # ld -o mvfs.o -r premvfs.o mvfs_param.o
     # /etc/autoconfig -f

Note: If you remove patchSG0000125, patchSG0000139 or patchSG0000254, you need to perform these same steps.

1.5  Subsystems included in patch SG0000254

This patch includes changes to the following IRIX 5.2 products: compiler_dev, eoe1, eoe2, nfs and dev.

The patchSG0000254 image contains the following subsystems:

o  patchSG0000254.compiler_dev_sw.dbx
o  patchSG0000254.dev_hdr.lib
o  patchSG0000254.eoe1_sw.quotas
o  patchSG0000254.eoe1_sw.unix
o  patchSG0000254.eoe2_sw.audit
o  patchSG0000254.eoe2_sw.kdebug
o  patchSG0000254.eoe2_sw.perf
o  patchSG0000254.nfs_sw.nfs

1.6  Installation instructions

This patch is only installable on systems running IRIX 5.2.

This patch requires installation in miniroot mode. To perform the installation, take the system down and follow the normal procedures for starting up the installation tool from the supplied release media. It is recommended that you select all the patch subsystems that correspond to software already installed on the system.

This patch will install on systems running IRIX 5.2, or on Challenge or Onyx systems with the 5.2-200MHz release installed to support IP19 200MHz CPU boards. When installing on the 5.2-200MHz release, inst will note an apparent version mismatch for the subsystem patchSG0000254.eoe1_sw.unix, as shown by:

     k N patchSG0000254.eoe1_sw.unix @ 0 14665+ IRIX Execution Environment

For correct installation of patchSG0000254, it is necessary to issue the following inst command:

     set neweroverride on

in order to force inst to install patchSG0000254.eoe1_sw.unix.

One way in which software patches differ from full releases and maintenance releases is that patches are reversible: you can remove the patch and restore the installed software to its state before the patch was applied.
This is done by using the versions command as superuser:

     versions remove patchSG0000254

Since this patch replaces some kernel object files, it is necessary to rebuild the kernel image and reboot after removing the patch:

     autoconfig
     reboot
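
The PROM workaround listed in section 1.3 (bug 231136) limits the port numbers of anonymous connections to 32767 because current PROMs do not support port numbers with the sign bit set. The C fragment below is a minimal sketch of why that limit follows from the sign bit of a 16-bit value. It is not taken from the IRIX kernel or PROM sources. The names port_as_signed16, fold_anonymous_port and SIGN_SAFE_PORT_MAX, and the folding scheme itself, are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Highest 16-bit port number whose sign bit (bit 15) is clear. */
#define SIGN_SAFE_PORT_MAX 0x7FFFu   /* 32767 */

/*
 * Hypothetical helper: the value seen by code that keeps a port number
 * in a signed 16-bit integer. It is computed arithmetically so that the
 * result is well defined in C.
 */
int port_as_signed16(uint16_t port)
{
    return port > SIGN_SAFE_PORT_MAX ? (int)port - 0x10000 : (int)port;
}

/*
 * Hypothetical allocator step: fold a candidate port into the range
 * [first, SIGN_SAFE_PORT_MAX] so that the sign bit is never set.
 * This is an illustration only, not the method used by IRIX.
 */
uint16_t fold_anonymous_port(uint16_t candidate, uint16_t first)
{
    assert(first <= SIGN_SAFE_PORT_MAX);
    uint32_t span = SIGN_SAFE_PORT_MAX - first + 1u;
    return (uint16_t)(first + (candidate % span));
}
```

Under these assumptions, port_as_signed16(40000) yields -25536, which is how a port above 32767 would appear to software that treats the value as signed, while any port returned by fold_anonymous_port stays at or below 32767 and therefore remains non-negative.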